1.  Introduction


A smoother is a procedure applied to bivariate data $(x_1, y_1), \ldots, (x_n, y_n)$ that
produces a decomposition

$$y_i = s(x_i) + r_i, \quad i = 1, \ldots, n, \tag{1}$$

where $s$ is a smooth function, often simply called the smooth, and the $r_i$ are residuals.
It is possible to define smoothness formally, but for our purposes an intuitive notion will
be sufficient. Smoothers are used to summarize the association between the predictor
variable $X$ and the response $Y$. Cleveland (1979) pointed out, and it is commonly
believed, that when looking at a scatterplot the eye is distracted by the extreme points
in the point cloud, i.e., the fuzzy background, and tends to miss structure in the bulk
of the data. Augmenting the plot with a smooth is a possible remedy.

More formally, one can consider a probabilistic framework in which the data are
an i.i.d. random sample from some joint distribution of $(X, Y)$. One can define an optimal
function $f$ for predicting $Y$ from $X$ as the one that minimizes the expected squared
difference between $Y$ and $f(X)$. That is,

$$E_{X,Y}[Y - f(X)]^2 = \min_g E_{X,Y}[Y - g(X)]^2, \tag{2}$$

where $g$ ranges over all functions. The function $f(X)$ is also the transformation of $X$
that is maximally correlated with $Y$. The solution function $f$ is

$$f(x) = E[Y \mid X = x].$$

Smoothers can be regarded as procedures for estimating the conditional expectation of
$Y$ given $X = x$. In many cases, one imagines the joint distribution of $(X, Y)$ to be
generated from the process

$$Y = f(X) + \varepsilon, \tag{3}$$

where $f(X)$ is a smooth function and $\varepsilon$ is an i.i.d. random variable with zero
expectation. Clearly, $E[Y \mid X = x] = f(x)$, so that the smooth $s$ can be considered an
estimate for $f$.

Recently, smoothers have found new uses in multiple regression algorithms (Friedman
and Stuetzle, 1981; Breiman and Friedman, 1984; Hastie and Tibshirani, 1984; and
Friedman, 1984). In these procedures, a smoother is used as a primitive operation
repeatedly applied to varying projections of the data; the quality of the smooth (2) is used
as a figure-of-merit driving the algorithm. In such applications, the smoother must be
both very flexible and rapidly computable. This paper describes such a smoother; it is,
in fact, the one currently in use with all but one of these algorithms.

2.  Basic  Concepts 

Assume the data are generated according to (3). We are interested in procedures that
can approximate $f$ arbitrarily closely, given a dense enough sample. A straightforward
estimator of a conditional expectation would be a conditional average

$$\hat{E}[Y \mid x_i] = \mathrm{ave}(y \mid x_i) = y_i.$$

Although this estimate is unbiased, it can have high variance. Also, this estimate need
not approach $f$ as the sample becomes denser. A more reasonable estimate is based on
local averaging. Take $s(x_i)$ to be the average of the responses $y$ for those observations
with predictor values $x$ in a neighborhood $N_i$ of $x_i$:

$$\hat{E}[Y \mid x_i] = s(x_i) = \mathrm{ave}(y_j \mid x_j \in N_i). \tag{4}$$

A critical parameter to be chosen is the span, the size of the neighborhood over which
averaging takes place. It controls the smoothness of $s$. The bigger the span, the smoother
$s$ will be. To obtain consistency, i.e., to make sure that $s$ gets arbitrarily close to $f$ as the
sampling  rate  increases,  one  must  shrink  the  diameter  of  the  neighborhood  in  such  a  way 
that  the  number  of  observations  in  the  neighborhood  still  grows  to  infinity.  Shrinking  the 
neighborhood  makes  the  systematic  or  bias  component  in  the  estimation  error  diminish, 
while  increasing  the  neighborhood  sample  size  guarantees  that  the  variance  component 
of  the  error  goes  to  zero  as  well. 

3.  A  Simple  Nonresistant  Smoother 

With a local averaging smoother (4), the size of the neighborhood is usually specified
by the span, the number $J$ of observations to be included in the averaging. We will
assume $J$ to be odd and the abscissas $x_i$ to be in increasing order. The neighborhood
can be chosen either symmetrically, containing $J/2$ observations to the left of $x_i$ and
the same number to the right, or it can be chosen to contain the $J$ nearest neighbors
of $x_i$, including $x_i$. (We assume that $J/2$ is computed by integer division.) There are
no general results on which of these two possibilities is better. The nearest neighbors
approach generalizes to higher dimensions, but the choice of a symmetric neighborhood
is computationally simpler in that exactly one point enters and one point leaves the
neighborhood as one moves from observation $i$ to observation $i + 1$. We will, in the
following, use symmetric neighborhoods. Near the boundaries, it is, of course, not
possible to keep $N_i$ symmetric. The average (4) need not be recomputed every time.
It can be updated, reducing the computation from order $nJ$ to order $n$. Such updating
can be done for all the smoothers we will consider, and is highly desirable because in
typical applications $J$ is 5% to 50% of $n$, and thus the savings are substantial.
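
The updating idea can be illustrated for the moving average (4). The following is a minimal Python sketch, not the FORTRAN program mentioned at the end of this report; the function name `running_mean` and the decision to truncate the window at the boundaries are assumptions made for illustration.

```python
def running_mean(y, span):
    """Moving-average smooth of y (already ordered by x) with a symmetric window.

    span is the odd number J of points in a full window. Near the ends the
    window is truncated, so it holds fewer than J points there.
    """
    n = len(y)
    half = span // 2                  # J/2 by integer division
    lo, hi = 0, min(half, n - 1)      # window for i = 0 is y[lo..hi]
    total = sum(y[lo:hi + 1])
    smooth = []
    for i in range(n):
        smooth.append(total / (hi - lo + 1))
        # Slide to the window for i + 1: at most one point enters ...
        if hi < n - 1:
            hi += 1
            total += y[hi]
        # ... and at most one point leaves.
        if (i + 1) - half > lo:
            total -= y[lo]
            lo += 1
    return smooth
```

Each step costs a constant amount of work regardless of the span. On the exactly linear sequence `[1, 2, 3, 4, 5]` with span 3 the result is `[1.5, 2.0, 3.0, 4.0, 4.5]`: interior values are reproduced, while the two end values are pulled toward the interior because the truncated window is one-sided.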

The simple moving average smoother has some serious shortcomings. One disturbing
property is that it does not reproduce straight lines if the abscissa values are not
equispaced. Another disturbing feature is bad behavior at the boundaries. If, for example,
the slope of the underlying function $f$ is positive at the right boundary, the estimate for
observations close to the boundary will be biased downwards; if the slope is negative, the
estimate is biased upwards. Both problems can be alleviated by fitting a least squares
straight line to the observations in the neighborhood instead of fitting a constant (zero
slope) and taking the value of the line at $x_i$ as the smoothed value. (This keeps the bias
of the curve estimate strictly proportional to $d^2 f / dx^2$.) For the computation, again
updating formulas can be used. The slope $\beta$ and intercept $\alpha$ of the least squares straight
line through a set of points $(x_1, y_1), \ldots, (x_J, y_J)$ are given by

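$$\beta = C_J / V_J, \qquad \alpha = \bar{y}_J - \beta \bar{x}_J,$$

with

$$
\begin{aligned}
\bar{x}_J &= \sum_{j=1}^{J} x_j / J, \\
\bar{y}_J &= \sum_{j=1}^{J} y_j / J, \\
C_J &= \sum_{j=1}^{J} (x_j - \bar{x}_J)(y_j - \bar{y}_J), \\
V_J &= \sum_{j=1}^{J} (x_j - \bar{x}_J)^2.
\end{aligned}
\tag{5}
$$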

When we want to add an observation $(x_{J+1}, y_{J+1})$, we can make use of the following
easily derived formulas:

$$
\begin{aligned}
\bar{x}_{J+1} &= (J \bar{x}_J + x_{J+1}) / (J + 1), \\
\bar{y}_{J+1} &= (J \bar{y}_J + y_{J+1}) / (J + 1), \\
C_{J+1} &= C_J + \frac{J+1}{J} (x_{J+1} - \bar{x}_{J+1})(y_{J+1} - \bar{y}_{J+1}), \\
V_{J+1} &= V_J + \frac{J+1}{J} (x_{J+1} - \bar{x}_{J+1})^2.
\end{aligned}
$$

Analogous  formulas  can  be  used  for  removal  of  an  observation  from  the  set. 
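
The updating scheme for the local straight line can be sketched in Python as follows. This is an illustrative sketch only: the class name `RunningLine` and its methods are hypothetical, the add step assumes the correction factor in the covariance and variance updates is $(J+1)/J$, and the `remove` step is our own inversion of the add step, since the text only states that analogous formulas exist.

```python
class RunningLine:
    """Running statistics for a least squares line, updated one point at a time."""

    def __init__(self):
        self.n = 0
        self.xbar = self.ybar = 0.0
        self.c = self.v = 0.0          # C_J and V_J in the notation of (5)

    def add(self, x, y):
        j = self.n                     # number of points before the addition
        self.n = j + 1
        self.xbar = (j * self.xbar + x) / (j + 1)
        self.ybar = (j * self.ybar + y) / (j + 1)
        if j > 0:
            f = (j + 1) / j
            self.c += f * (x - self.xbar) * (y - self.ybar)
            self.v += f * (x - self.xbar) ** 2

    def remove(self, x, y):
        j = self.n                     # number of points before the removal
        if j > 1:
            f = j / (j - 1)
            # Undo the add step, using the means that still include (x, y).
            self.c -= f * (x - self.xbar) * (y - self.ybar)
            self.v -= f * (x - self.xbar) ** 2
            self.xbar = (j * self.xbar - x) / (j - 1)
            self.ybar = (j * self.ybar - y) / (j - 1)
        else:
            self.xbar = self.ybar = self.c = self.v = 0.0
        self.n = j - 1

    def fit(self, x):
        """Value at x of the least squares line through the current points."""
        beta = self.c / self.v if self.v > 0 else 0.0
        return self.ybar + beta * (x - self.xbar)
```

Sliding the neighborhood from one observation to the next then amounts to one `add` and one `remove`, followed by `fit` at the new center. In long runs, repeated add and remove steps can accumulate rounding error, so an implementation may want to recompute the sums from scratch occasionally.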


4.  Choice  of  Span 


The most important choice in the use of a local averaging smoother is the choice of
the span value. If the smoother is regarded as an estimator for $f(x)$ (3), then the span
controls the trade-off between bias and variance of the estimate. We illustrate this for
the case of a simple moving average smoother (4). In this case, the smoothed value at
point $x_i$ is given by

$$s(x_i) = \frac{1}{J} \sum_{j=i-J/2}^{i+J/2} y_j.$$

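If we assume that the errors $\varepsilon_i$ are i.i.d. with expected value zero and variance
$\sigma^2$, then the expected squared error at point $x_i$ is

$$e^2(x_i \mid J) = \Big[ f(x_i) - \frac{1}{J} \sum_{j=i-J/2}^{i+J/2} f(x_j) \Big]^2 + \frac{\sigma^2}{J}. \tag{6}$$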


Increasing the span $J$ will (if $d^2 f / dx^2 \neq 0$) increase the first term, the bias
component of the estimation error, and decrease the second term, the variance component;
decreasing the span will have the opposite effect. Stated more geometrically, a larger span
makes the smooth appear less wiggly by more strongly damping high frequency components of
the series $\{y_i\}$.
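
The two-term form of the expected squared error can be derived in a few lines. The steps below are a sketch under the stated assumptions (errors i.i.d. with mean zero and variance $\sigma^2$, error measured against $f(x_i)$); the final approximation additionally assumes an interior point and equally spaced abscissas with spacing $h$, which the text does not require.

```latex
\begin{align*}
s(x_i) &= \frac{1}{J}\sum_{j=i-J/2}^{i+J/2} f(x_j)
        + \frac{1}{J}\sum_{j=i-J/2}^{i+J/2} \varepsilon_j , \\
E\,[f(x_i) - s(x_i)]^2
  &= \Big[f(x_i) - \frac{1}{J}\sum_{j} f(x_j)\Big]^2
   + \operatorname{Var}\Big(\frac{1}{J}\sum_{j} \varepsilon_j\Big)
   \qquad (\text{cross term vanishes since } E\,\varepsilon_j = 0) \\
  &= \Big[f(x_i) - \frac{1}{J}\sum_{j} f(x_j)\Big]^2 + \frac{\sigma^2}{J} , \\
f(x_i) - \frac{1}{J}\sum_{k=-J/2}^{J/2} f(x_i + k h)
  &\approx -\frac{J^2 - 1}{24}\, h^2 f''(x_i)
   \qquad \Big(\text{using } \tfrac{1}{J}\textstyle\sum_k k^2 = \tfrac{J^2-1}{12}\Big).
\end{align*}
```

Under these extra assumptions the squared bias grows roughly like $J^4$ while the variance falls like $1/J$, which is the trade-off the span controls.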

One can estimate the optimal span value in a particular situation as that value that
minimizes an estimate for the expected squared error (6), averaged over the observations.
Using the average squared residual of the data from the smooth

$$\hat{e}^2(J) = \frac{1}{n} \sum_{i=1}^{n} [y_i - s(x_i \mid J)]^2$$

for this purpose is not appropriate since this is always minimized by the span value
$J = 1$. A better estimate is provided by a method referred to as “cross-validation” (M.
Stone, 1974) or “predictive sample reuse” (Geisser, 1975). Each observation is in turn
deleted and the value of the smooth $s_{(i)}(x_i \mid J)$ at $x_i$ is calculated from the other
$n - 1$ observations. The cross-validated estimate of the integrated square error is

$$\hat{e}^2_{cv}(J) = \frac{1}{n} \sum_{i=1}^{n} [y_i - s_{(i)}(x_i \mid J)]^2. \tag{7}$$


Clearly, $E[\hat{e}^2_{cv}]$ equals the expected squared error obtained by applying the
procedure to a sample of $n - 1$ observations from the same distribution. The cross-validated
estimate for the optimal span value is taken to be the value $J_{cv}$ that minimizes (7),

$$\hat{e}^2_{cv}(J_{cv}) = \min_{0 < J < n} \hat{e}^2_{cv}(J).$$


Model  selection  through  cross-validation  has  been  remarkably  successful  in  a  wide  variety 
of  situations  (see  M.  Stone,  1974,  Geisser,  1975,  Craven  and  Wahba,  1979,  C.  Stone, 
1981). 
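
To make the contrast between the ordinary and the cross-validated criterion concrete, here is a small Python sketch for the moving average smoother. It recomputes every window from scratch for clarity rather than speed; the function names and the candidate span list are illustrative assumptions.

```python
import math
import random

def window_mean(y, i, span, leave_out):
    """Mean of y over the symmetric window around i, optionally omitting y[i]."""
    half = span // 2
    lo, hi = max(0, i - half), min(len(y) - 1, i + half)
    vals = [y[j] for j in range(lo, hi + 1) if not (leave_out and j == i)]
    return sum(vals) / len(vals)

def avg_sq_residual(y, span, leave_out=False):
    """Average squared residual; with leave_out=True, the cross-validated version."""
    n = len(y)
    return sum((y[i] - window_mean(y, i, span, leave_out)) ** 2
               for i in range(n)) / n

def cv_span(y, candidates):
    """Candidate span (each must be >= 3) with the smallest cross-validated error."""
    return min(candidates, key=lambda j: avg_sq_residual(y, j, leave_out=True))

if __name__ == "__main__":
    rng = random.Random(1)
    n = 100
    y = [math.sin(2 * math.pi * i / n) + rng.gauss(0.0, 0.3) for i in range(n)]
    print("ordinary criterion at J = 1:", avg_sq_residual(y, 1))
    print("cross-validated span:", cv_span(y, [3, 5, 9, 15, 25, 41]))
```

With $J = 1$ each point is its own smooth, so the ordinary criterion is exactly zero whatever the data, which is why it cannot be used to choose the span. The leave-one-out version needs at least one other point in the window, hence the restriction to spans of 3 or more.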

For the moving average smoothers discussed above, the cross-validated residuals

$$r_{(i)}(J) = y_i - s_{(i)}(x_i \mid J)$$

are simply related to the ordinary residuals

$$r_i(J) = y_i - s(x_i \mid J)$$

owing to the fact that these smoothers are linear. A linear smoother is one for which
the value of the smooth for a particular observation is a linear combination of the $y$
values for all of the observations, i.e.,

$$s(x_i \mid J) = \sum_{j=1}^{n} H_{ij}(J)\, y_j.$$

The linear combination $H_{ij}$ may be different for each observation $i$ and depends on $J$.
(Note that if $x_j$ is not in the neighborhood of $x_i$, $H_{ij}(J) = 0$.) For linear smoothers,
the cross-validated residual is given by

$$r_{(i)}(J) = \frac{r_i(J)}{1 - H_{ii}(J)}.$$

For the local straight line smoother discussed above, it is straightforward to calculate

$$H_{ii}(J) = \frac{1}{J} + \frac{(x_i - \bar{x}_J)^2}{V_J},$$

with $\bar{x}_J$ and $V_J$ given by (5). Therefore,

$$\hat{e}^2_{cv}(J) = \frac{1}{n} \sum_{i=1}^{n} [y_i - s(x_i \mid J)]^2 \Big/ \Big[ 1 - \frac{1}{J} - \frac{(x_i - \bar{x}_J)^2}{V_J} \Big]^2.$$
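
The practical value of this relation is that the deleted fit never has to be computed. The following Python sketch checks the shortcut for a single neighborhood against an explicit refit without the point; it assumes the diagonal weight of the local line is $1/J + (x_i - \bar{x}_J)^2 / V_J$, and all function names are illustrative.

```python
def line_fit(xs, ys, x0):
    """Value at x0 of the least squares line through the points (xs, ys)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    v = sum((x - xbar) ** 2 for x in xs)
    c = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    return ybar + (c / v) * (x0 - xbar)

def loo_residual_shortcut(xs, ys, i):
    """Cross-validated residual from the ordinary residual and the weight H_ii."""
    n = len(xs)
    xbar = sum(xs) / n
    v = sum((x - xbar) ** 2 for x in xs)
    h_ii = 1.0 / n + (xs[i] - xbar) ** 2 / v
    return (ys[i] - line_fit(xs, ys, xs[i])) / (1.0 - h_ii)

def loo_residual_refit(xs, ys, i):
    """Cross-validated residual by actually refitting without point i."""
    xs_rest = xs[:i] + xs[i + 1:]
    ys_rest = ys[:i] + ys[i + 1:]
    return ys[i] - line_fit(xs_rest, ys_rest, xs[i])

if __name__ == "__main__":
    xs = [0.0, 0.5, 1.5, 2.0, 3.5]     # one neighborhood, J = 5
    ys = [1.0, 0.2, 2.2, 1.9, 4.0]
    for i in range(len(xs)):
        print(i, loo_residual_shortcut(xs, ys, i), loo_residual_refit(xs, ys, i))
```

The two columns agree up to rounding, so the cross-validated criterion comes at essentially no extra cost once $\bar{x}_J$ and $V_J$ are available from the updating formulas.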


For small to moderate changes in $J$, $\hat{e}^2_{cv}(J)$ changes very little so that it is
adequate to evaluate it for several (3 to 5) discrete values of $J$ in the range $0 < J < n$.
The value of $J$ corresponding to the smallest of these $\hat{e}^2_{cv}(J)$ values is then used.
This can be accomplished by maintaining several running average smoothers, one for each span
value, in the pass over the data, thus keeping the computational cost linear in $n$.


5.  Variable  Span  Smoother 

So  far,  we  have  been  assuming  that  the  (number  of  counts  in  the)  span  remains 
constant over the whole range of predictor $x$ values. This is not optimal if the
variance of the random component and/or the second derivative of the underlying
function $f$ change over the range of predictor values. A local increase in error variance
would call for an increase in span, whereas an increase in second derivative of $f$ would
require  a  decrease.  It  is,  therefore,  desirable  to  allow  the  span  value  to  adapt  to  these 
changing  conditions.  This  requires  that  the  optimal  span  value  be  chosen  locally  rather 
than  using  a  single  global  value. 

More formally, one can estimate an optimal span value for each $x_i$, as well as the
corresponding optimal smooth value, by minimizing an estimate for

$$e^2(s, J) = E_{X,Y}[Y - s(X \mid J(X))]^2$$

with respect to both functions $s(x)$ and $J(x)$. The resulting function $s(x)$ is then taken
as our smooth. Re-expressing this criterion as

$$e^2(s, J) = E_X E_Y \{ [Y - s(X \mid J(X))]^2 \mid X \},$$

we see that $s(x)$ (and $J(x)$) can be found by minimizing

$$e^2(s, J \mid x) = E_Y \{ [Y - s(x \mid J)]^2 \mid x \} \tag{8}$$

with respect to $s$ and $J$ for each value of $x$. This will result in smaller $e^2(s, J)$ than
constraining $J(x)$ to be constant. (This is not necessarily true for the estimates, however.
The decrease in bias associated with the variable span may be more than offset by the
increased variance associated with estimating the additional function $J(x)$.)

As with the constant span case, we begin by applying the local linear smoother
several times with several discrete values of $J$ in the range $0 < J < n$. In our
implementation, we use three values: $J = 0.05n$, $0.2n$, and $0.5n$. These are intended
to reproduce the three main parts of the frequency spectrum of $f(x)$ and are referred
to as the tweeter, midrange, and woofer smoothers respectively. It is then necessary to
estimate (8) at each data value $x_i$ for each smoother. Simply using the cross-validated
residual

$$r_{(i)}(J) = [y_i - s(x_i \mid J)] \Big/ \Big[ 1 - \frac{1}{J} - \frac{(x_i - \bar{x}_J)^2}{V_J} \Big] \tag{9}$$

results in estimates with too much variance since each estimate is based on only one
observation. Better estimates can be obtained by smoothing $r^2_{(i)}(J)$ against $x_i$ (with the
midrange smoother) and using the smoothed values as the estimates $\hat{e}^2(s, J \mid x_i)$. For
stability reasons, it turns out to be a little better to smooth $|r_{(i)}(J)|$ against $x_i$, using
the resulting estimates $\hat{e}(s, J \mid x_i)$ to select the best span value:

$$\hat{e}(s, J_{cv}(x_i) \mid x_i) = \min_J \hat{e}(s, J \mid x_i), \tag{10}$$

where $J$ takes on the tweeter, midrange and woofer span values. The smoothed response
value $s^*(x_i)$ at each $x_i$ can then be taken as the smoother (tweeter, midrange, or woofer)
value associated with the optimal span estimate

$$s^*(x_i) = s(x_i \mid J_{cv}(x_i)).$$

When obtained in this manner, the optimal span (and curve) estimates can have
unnecessarily high variance. This is because the estimated span value $J_{cv}(x_i)$ is not
constrained to vary smoothly from one observation to the next (as ordered on $x_i$). It is
possible that two (or more) smoothers can have very similar $\hat{e}$ values in a region of $x$,
but different values of $s$. Due to variance in the estimates $\hat{e}(s, J \mid x_i)$, different span
(and curve) values can be chosen for neighboring $x_i$. Better optimal span (and resulting
curve) estimates are obtained by smoothing the values $J_{cv}(x_i)$ (10) against $x_i$ (again
with the midrange smoother). The result is an estimated span for each observation with
a value between the tweeter and woofer values. The resulting curve estimate is obtained
by interpolating between the two (out of the three) smoothers with closest span values.
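
The selection and interpolation steps at a single observation can be sketched as follows. The text does not say how the interpolation is carried out; linear interpolation in the span value is an assumption here, as are the function names and the use of spans expressed as fractions of $n$.

```python
SPANS = (0.05, 0.2, 0.5)   # tweeter, midrange, woofer as fractions of n

def select_span(errors, spans=SPANS):
    """Span whose smoothed absolute cross-validated residual is smallest."""
    k = min(range(len(spans)), key=lambda j: errors[j])
    return spans[k]

def interpolate(span, values, spans=SPANS):
    """Blend the two primary smooth values whose spans bracket `span`.

    `values` holds the tweeter, midrange and woofer smooth values at one x_i.
    The span is first clamped to lie between the tweeter and woofer spans.
    """
    span = min(max(span, spans[0]), spans[-1])
    for k in range(len(spans) - 1):
        lo, hi = spans[k], spans[k + 1]
        if span <= hi:
            w = (span - lo) / (hi - lo)
            return (1 - w) * values[k] + w * values[k + 1]
    return values[-1]
```

For example, a smoothed span halfway between the midrange and woofer spans returns the average of the midrange and woofer smooth values, so the composite curve changes gradually as the smoothed span drifts from one primary smoother to the next.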

It is often known (or suspected) that the underlying true curve $f(x)$ (3) is very
smooth.  When  this  is,  in  fact,  the  case,  more  accurate  curve  estimates  can  be  obtained 
by  biasing  the  span  selection  procedure  toward  larger  span  values.  Even  when  this  is 
not  the  case,  people  often  find  smoother  curves  more  visually  pleasing  and  are  willing  to 
sacrifice  a  degree  of  accuracy  for  an  estimate  that  is  less  rough.  We,  therefore,  need  a 
method  for  enhancing  the  low  frequency  (bass)  component  of  the  smoother  output.  For 
this  purpose,  we  introduce  a  bass  (tone)  control. 

The idea is to increase the span value selected at each $x_i$ in inverse proportion to
the increase in predicted absolute error $\hat{e}$ associated with the span increase. Let
$J_{cv}(x_i)$ be the estimated optimal span and $J_w$ the woofer span. The span value for each
$x_i$ is taken to be

$$J(x_i) = J_{cv}(x_i) + (J_w - J_{cv}(x_i))\, R_i^{10 - \alpha},$$

with

$$R_i = \frac{\hat{e}(s, J_{cv}(x_i) \mid x_i)}{\hat{e}(s, J_w \mid x_i)}. \tag{11}$$


Here $0 \le \alpha \le 10$ is a user specified parameter (tone control). The value $\alpha = 0$
corresponds to $J(x_i) \approx J_{cv}(x_i)$ (very little bass enhancement) while $\alpha = 10$
corresponds to $J(x_i) = J_w$ (maximum bass). Values of $\alpha$ between these extremes cause
different degrees of bass enhancement. For a given value of $\alpha$, the amount of bass increase
is controlled by the ratio $R_i$. The larger this ratio, the smaller the loss in increasing the
span, and thus, the more it is increased. This tone control is applied before the spans
are smoothed. Note that the amount of bass enhancement is highly nonlinear in the
parameter $\alpha$. Increases for small values of $\alpha$ have much less effect than the same sized
increases at larger $\alpha$ values. Figure 1 shows the amount of bass enhancement as a
function of $R_i$ for several values of $\alpha$.
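
The tone control can be written as a one-line rule. The sketch below assumes the adjustment has the form $J = J_{cv} + (J_w - J_{cv}) R^{10-\alpha}$ with $R$ the ratio of the predicted absolute error at the selected span to that at the woofer span; the function name, the clamping of $R$ to at most 1, and the handling of a zero woofer error are illustrative choices, not taken from the text.

```python
def bass_adjusted_span(j_cv, j_w, e_cv, e_w, alpha):
    """Span after tone control.

    j_cv, j_w : selected span and woofer span
    e_cv, e_w : predicted absolute errors at those two spans
    alpha     : tone control, 0 (almost no enhancement) to 10 (woofer span)
    """
    if not 0 <= alpha <= 10:
        raise ValueError("alpha must lie between 0 and 10")
    # The selected span minimizes the error, so the ratio should not exceed 1;
    # clamp it to guard against estimation noise (an assumption of this sketch).
    ratio = min(e_cv / e_w, 1.0) if e_w > 0 else 1.0
    return j_cv + (j_w - j_cv) * ratio ** (10 - alpha)

if __name__ == "__main__":
    for alpha in (0, 5, 9, 10):
        print(alpha, bass_adjusted_span(0.05, 0.5, 1.0, 2.0, alpha))
```

With an error ratio of one half, the adjusted span barely moves from the tweeter value for small settings and only approaches the woofer value for settings close to 10, which illustrates the nonlinearity in the tone control parameter.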

The  resulting  variable  span  smoother  makes  nine  passes  over  the  data: 

1.  Primary  data  smooths  with  tweeter,  midrange,  and  woofer  spans. 

2.  Smooth cross-validated absolute residuals (9) for each of the primary smooths
with midrange span.

3.  Select  best  span  as  minimizing  the  output  of  Step  2  for  each  observation.  (Apply 
low  frequency,  bass  enhancement  if  desired.) 

4.  Smooth  best  span  estimates  with  midrange  span. 

5.  Use  smoothed  span  estimates  to  interpolate  between  primary  smoother  values. 
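
Steps 1 to 5 can be strung together in a compact Python sketch. This is an illustration under several assumptions, not the author's FORTRAN program: windows are recomputed from scratch (cost of order $nJ$) instead of using updating formulas, spans are handled as fractions of $n$, interpolation between primary smooths is linear in the span, the tone control uses the exponent $10 - \alpha$, and the names `local_line` and `supersmooth` are hypothetical. The abscissas are assumed sorted and distinct.

```python
SPANS = (0.05, 0.2, 0.5)   # tweeter, midrange, woofer as fractions of n

def local_line(x, y, frac):
    """Running-line smooth; returns (smooth, absolute cross-validated residuals)."""
    n = len(x)
    half = max(1, int(frac * n) // 2)
    smooth, abs_cv = [], []
    for i in range(n):
        lo, hi = max(0, i - half), min(n - 1, i + half)
        m = hi - lo + 1
        xs, ys = x[lo:hi + 1], y[lo:hi + 1]
        xbar, ybar = sum(xs) / m, sum(ys) / m
        v = sum((a - xbar) ** 2 for a in xs)
        c = sum((a - xbar) * (b - ybar) for a, b in zip(xs, ys))
        beta = c / v if v > 0 else 0.0
        s = ybar + beta * (x[i] - xbar)
        h = 1.0 / m + ((x[i] - xbar) ** 2 / v if v > 0 else 0.0)
        smooth.append(s)
        abs_cv.append(abs(y[i] - s) / max(1.0 - h, 1e-12))
    return smooth, abs_cv

def supersmooth(x, y, alpha=None):
    n = len(x)
    primary, errs = [], []
    for frac in SPANS:                                   # step 1
        s, r = local_line(x, y, frac)
        primary.append(s)
        errs.append(local_line(x, r, SPANS[1])[0])       # step 2
    spans = []
    for i in range(n):                                   # step 3
        k = min(range(3), key=lambda j: errs[j][i])
        span = SPANS[k]
        if alpha is not None:                            # optional bass enhancement
            e_w = errs[2][i]
            ratio = min(1.0, max(0.0, errs[k][i] / e_w)) if e_w > 0 else 1.0
            span += (SPANS[2] - span) * ratio ** (10 - alpha)
        spans.append(span)
    spans = local_line(x, spans, SPANS[1])[0]            # step 4
    out = []
    for i in range(n):                                   # step 5
        f = min(max(spans[i], SPANS[0]), SPANS[2])
        k = 0 if f <= SPANS[1] else 1
        w = (f - SPANS[k]) / (SPANS[k + 1] - SPANS[k])
        out.append((1 - w) * primary[k][i] + w * primary[k + 1][i])
    return out
```

Calling `supersmooth(x, y)` applies no bass enhancement, while `supersmooth(x, y, alpha=10)` reduces to the woofer smooth. Because every primary smoother is a local straight line fit, data lying exactly on a line are reproduced by the composite smooth whatever spans are selected.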

It  is  important  to  note  that  using  cross-validated  residuals  as  a  basis  for  choosing 
span value is highly sensitive to lack of independence among the $\varepsilon_i$ (3) as ordered on
x.  If  there  is  a  large  positive  (negative)  correlation  among  observations  with  similar  x 
values,  substantial  under  (over)  estimates  will  result.  In  situations  where  a  high  degree 
of  auto-correlation  is  suspected,  these  span  selection  procedures  should  be  used  with 
caution. 


6.  An  Example 

In this section, we present a simulated example intended to illustrate a situation
where variable span is important. The data for this example consist of $n = 200$ pairs
$(x_i, y_i)$ with the $x_i$ drawn randomly (i.i.d.) from a uniform distribution in the interval
$(0, 1)$. The $y_i$ are obtained from

$$y_i = \sin(2\pi (1 - x_i)^2) + x_i \varepsilon_i \tag{12}$$

with the $\varepsilon_i$ i.i.d. standard normal. This example simulates a situation in which the
curvature of $f$ decreases and the variance of the random component increases with
increasing $x$. In the first set of examples, no bass enhancement was used. Figure 2a shows
a scatterplot of these data with the resulting variable span smooth $s(x)$ superimposed.
Figure 2b shows the individual tweeter, midrange, and woofer smooths. Figure 2c shows
the estimated optimal span $J(x) = J_{cv}(x)$ as a function of $x$.
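
For readers who want to reproduce data of this kind, the following Python sketch draws one sample. It assumes the model reads $y_i = \sin(2\pi(1 - x_i)^2) + x_i \varepsilon_i$; the function name and the seed are arbitrary, so the sample will not match the one behind the figures.

```python
import math
import random

def simulate(n=200, seed=0):
    """One sample from the test model: sorted x in (0, 1) and noisy responses y."""
    rng = random.Random(seed)
    x = sorted(rng.random() for _ in range(n))
    # Noise standard deviation equals x, so the variance grows with x
    # while the curvature of the sine term dies out toward x = 1.
    y = [math.sin(2 * math.pi * (1 - xi) ** 2) + xi * rng.gauss(0.0, 1.0)
         for xi in x]
    return x, y
```

Sorting the abscissas up front matches the assumption, made throughout, that the observations are ordered on $x$.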

In the low noise high curvature region ($x < 0.2$), the tweeter span is selected. In
the high noise low curvature region ($x > 0.8$), the span increases rapidly to the woofer
value. In the region where both curvature and noise are moderate, the selected span
averages just below the midrange value. The resulting composite smooth $s(x)$ (Fig. 2a)
is seen to be much better than any of the individual (tweeter, midrange, or woofer)
smooths (Fig. 2b).

In order to see to what extent these results reflect general behavior, 1000 data sets
were generated, all with an identical set of $x_i$, but each with a different random set of
$\varepsilon_i$. The $y_i$ were constructed as in (12). Figure 2d shows the estimated optimal span
function $J(x)$ averaged over these 1000 runs. This $\bar{J}(x)$ reflects similar behavior to that
of the first run, $J(x)$. The span is seen to rise a bit more rapidly in the region of middle $x$
values, but not to as high a value for large $x$. Figure 2e shows the average accuracy of
the composite smooth, as well as each of the three primary smooths, as a function of $x$.
The absolute error

$$e(x_i) = |s(x_i) - \sin[2\pi (1 - x_i)^2]|$$

was averaged over the 1000 runs for each $x_i$. (The points for each smoother are connected
by straight lines.) The composite variable span smooth is again seen to be much better
than any of the three constant span primary smooths. It incurs none of the (very large)
bias associated with the midrange and woofer spans for low $x$ values, and its absolute
error is about one-half that of the tweeter for the larger $x$ values. Over the entire range
of $x$ values, the variable span smoother has performance comparable to the best of the
primary smoothers at each $x$ value. Only for the very largest $x$ values ($x > 0.7$) does the
woofer smoother incur about 20% less error. Figure 2e also illustrates the problems
associated with end effects. The average error for points near the very edges of the $x$
interval is about twice that for close-by interior points.

Figures 3a-3e show the corresponding results for data generated as above but with
$n = 100$. The results for this smaller sample size reflect the same general behavior
described above. The average absolute error is somewhat higher, especially in the high
variance (large $x$) region.

Figure 4a shows the same data as that of Figure 2a, but the superimposed smooth
is the result of applying some bass enhancement, $\alpha = 5$ (11). The result is visually
more pleasing in that it is less wiggly in the high variance region ($x > 0.5$). There
appears to be an increase in bias, however, in that the curve seems to lie above the data
near $x = 0.1$ and undershoot the data near $x = 0.5$. These suspicions are verified in
Figure 4b where the average absolute error (over 1000 runs) of the composite variable
span smoother, as well as the three primary smoothers, is shown. Although the error
is reduced to that of the woofer for $x > 0.6$, it is dramatically increased in the high
curvature regions $0.05 < x < 0.20$ and $0.35 < x < 0.60$. Figure 4c shows the
average span function $\bar{J}(x)$. Except for the very low noise high curvature region ($x <
0.1$), the selected span value is generally larger than the estimated optimal span $J_{cv}(x)$
(Figure 2d).

This  example  was  deliberately  constructed  to  be  difficult  and  to  test  the  variable 
span  aspect  of  the  smoothing  procedure.  It  shows  that  the  method  can  readily  adapt  to 
changing  circumstances  (function  curvature  and/or  error  variance).  Not  all  situations 
encountered  in  practice  are  this  dramatic  and  in  less  dramatic  situations  the  gain  using 
variable  span  will  be  correspondingly  less.  In  some  settings,  the  additional  variance 
encountered in estimating the two functions $s(x)$ and $J(x)$ can more than offset the
decrease in bias so that using an optimally estimated constant span will incur less absolute
error.  This  becomes  more  likely  for  small  sample  sizes  (n  <  40).  Even  in  these  cases, 
however,  the  variable  span  smoother  is  usually  almost  as  good  as  the  best  single  span 
smoother,  especially  if  some  bass  enhancement  is  employed. 

7.  Discussion 

Cleveland  (1979)  suggested  a  smoother  also  based  on  local  linear  fits.  It  differs  from 
the  one  described  in  this  report  mainly  in  three  respects: 

- It does not automatically choose the span by cross-validation.

- It does not use variable span.

- In the fit of the local straight line determining the smooth $s(x_i)$ for predictor value
$x_i$, the observations are weighted according to their distance from $x_i$; observations
towards the extremes of the span receive lower weights than observations
with predictor values close to $x_i$. Asymptotic calculations suggest that assigning
unequal weights should reduce the error of the curve estimate, but there is
no evidence that it makes a substantial difference for sample sizes occurring in
practice. It does, however, produce a smoother looking estimate.

Updating formulas cannot be used in this scheme, making it comparatively expensive
in terms of computing. To reduce computation, Cleveland suggests evaluating the
smooth only for every $\ell$-th ($\ell \ll n$) predictor value. The smoothing procedure described in
this report was developed because the best span value is usually not known in advance,
a variable span is often important, and the use of updating formulas dramatically
reduces computation. This is critical when the smoother is repeatedly applied as
a primitive operation in more complicated algorithms.

Another class of procedures suggested for smoothing is based on splines. A spline
function $s$ of order $\ell$ with knots at $z_1, \ldots, z_k$ is a function satisfying the following two
conditions:

- In each of the intervals $(-\infty, z_1), (z_1, z_2), \ldots, (z_{k-1}, z_k), (z_k, \infty)$, $s$ is a
polynomial of degree $\ell - 1$;

- $s$ has $\ell - 2$ continuous derivatives.

One way to use spline functions in smoothing is to fit a spline function with knots
$z_1, \ldots, z_k$ to the data $(x_1, y_1), \ldots, (x_n, y_n)$, either by least squares or by some resistant
method. The degree of smoothness is determined by the number and position of the
knots. A major disadvantage of this method is that $k + 1$ parameters must be chosen:
the number and the positions of the knots. Usually some heuristic procedure is used
to place the knots once $k$ has been fixed (Jupp, 1978). This leaves the number of knots
to be determined. This number plays the role of the span in determining the degree
of smoothing. Unfortunately, the output of the smoother can depend on $k$ in a very
nonlinear way; it is easy to construct examples where the addition of one more knot
substantially decreases the residual sum of squares, whereas further knots hardly make
any difference. This makes $k$ more difficult to choose than the span in a local averaging
smoother. Furthermore, least squares fitting of splines is substantially slower so that
choosing $k$ through cross-validation is usually too expensive.

Another way is to use smoothing splines in the sense of Reinsch (1967). A smoothing
spline $s$ of order $2\ell$ for smoothing parameter $\lambda$ is the function that minimizes

$$\sum_{i=1}^{n} (y_i - f(x_i))^2 + \lambda \int_{x_1}^{x_n} [f^{(\ell)}(x)]^2 \, dx$$

among all functions $f$ with $\ell$ derivatives. The solution turns out to be a spline function
of order $2\ell$ with knots $x_1, \ldots, x_n$; the name is thus justified. The larger $\lambda$ is chosen,
the smoother $s$ becomes; thus, $\lambda$ here plays the role of the span. Computation of the
spline for given $\lambda$ requires the solution of a banded $n \times n$ linear system. A drawback
of the method, as described here, is that it is impossible to obtain an intuitive feeling
for the choice of $\lambda$ in a given example. So, one usually fixes not $\lambda$, but the residual
sum of squares around the smooth. The corresponding value of $\lambda$ then has to be found
iteratively by repeatedly solving the minimization problem. This substantially increases
the necessary amount of computation. Algorithms to determine the optimal $\lambda$ by
cross-validation usually require computation of the singular value decomposition of an
$n \times n$ matrix; they are expensive and infeasible for sample sizes larger than 200-300. An
approximate method has recently been proposed (Silverman, 1984), however, that is
much faster, thereby extending the use of smoothing splines to larger samples.

To summarize, the local averaging smoother described in this report has two desirable
properties that set it apart from other smoothers: it is very fast to compute,
and the value of the parameter that controls the amount of smoothing is automatically
optimized locally (through cross-validation), allowing it to adapt to the response
function over the range of predictor values. A listing of a FORTRAN program implementing
the procedure described herein is available from the author.

FIGURE CAPTIONS

Figure 1:   Bass amplification factor as a function of predictive-absolute-error
            ratio for various tone control settings.

Figure 2a:  Scatterplot of data with composite smooth superimposed.
Figure 2b:  Individual tweeter, midrange and woofer smooths.
Figure 2c:  Selected span $J_{cv}(x)$.
Figure 2d:  Expected estimated optimal span $\bar{J}_{cv}(x)$.
Figure 2e:  Expected absolute error of three primary smooths and composite
            variable span smooth.

Figure 3a:  Scatterplot of data with composite smooth superimposed.
Figure 3b:  Individual tweeter, midrange and woofer smooths.
Figure 3c:  Selected span $J_{cv}(x)$.
Figure 3d:  Expected estimated optimal span $\bar{J}_{cv}(x)$.
Figure 3e:  Expected absolute error of three primary smooths and composite
            variable span smooth.

Figure 4a:  Scatterplot of data with composite smooth superimposed.
Figure 4b:  Expected absolute error of three primary smooths and composite
            variable span smooth.
Figure 4c:  Expected chosen span $\bar{J}(x)$.